104 research outputs found

    HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection

    The highly expressed genes database (HEG-DB) is a genomic database that includes the prediction of which genes are highly expressed in prokaryotic complete genomes under strong translational selection. The current version of the database contains general features for almost 200 genomes under translational selection, including the correspondence analysis of the relative synonymous codon usage for all genes, and the analysis of their highly expressed genes. For each genome, the database contains functional and positional information about the predicted group of highly expressed genes. This information can also be accessed using a search engine. Among other statistical parameters, the database also provides the Codon Adaptation Index (CAI) for all of the genes using the codon usage of the highly expressed genes as a reference set. The ‘Pathway Tools Omics Viewer’ from the BioCyc database enables the metabolic capabilities of each genome to be explored, particularly those related to the group of highly expressed genes. The HEG-DB is freely available at http://genomes.urv.cat/HEG-DB
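    The CAI calculation mentioned in this abstract can be sketched as follows. This is only an illustration of the standard Codon Adaptation Index definition (the geometric mean of relative adaptiveness values derived from a reference set of highly expressed genes), not the HEG-DB code. The function names, the toy sequences, and the 0.5 pseudocount for unseen codons are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon = its count / count of the most used synonymous codon.

    Met, Trp and stop codons are excluded because they carry no synonymous choice.
    The pseudocount (an assumption here) keeps unseen codons from getting weight 0.
    """
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TO_AA)
    weights = {}
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        best = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no codons that can be scored")
    return math.exp(sum(logs) / len(logs))


# Toy usage: the reference set would normally be the predicted highly expressed genes.
reference = ["CTGCTGCTGCTGAAA"]
w = relative_adaptiveness(reference)
score = cai("CTGAAACTT", w)
```

    A gene built only from the codons preferred in the reference set scores 1, and the score drops as rarer synonymous codons are used.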

    Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.

    A Benchmark of Parametric Methods for Horizontal Transfers Detection

    Horizontal gene transfer (HGT) appears to be important for the evolution of prokaryotic species. As a consequence, numerous parametric methods, which use only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports have been published of incongruent results when different methods are applied to the same genomes. Using artificial genomes in which all HGT parameters are controlled allows different methods to be tested under the same conditions. The results of this benchmark of 16 representative parametric methods showed a great variety of efficiencies. Some methods work very poorly whatever the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window methods, or those using codon usage with the Kullback-Leibler divergence metric for gene-based methods. Window methods are very sensitive but less specific, and they perform poorly at detecting lone, isolated genes. Gene-based methods, on the other hand, are often very specific but lack sensitivity. We propose using two methods in combination to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
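    As a rough illustration of the gene-based approach named in this abstract, the sketch below scores each gene by the Kullback-Leibler divergence between its codon usage and the pooled codon usage of all genes. It is not the benchmarked implementation: the function names, the pseudocount of 1 per codon, and the toy sequences are assumptions made for the example, and no decision threshold is proposed.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_counts(seq):
    """Count in-frame codons, ignoring any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def frequencies(counts, pseudocount=1.0):
    """Turn codon counts into a smoothed probability distribution over all 64 codons."""
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def kl_divergence(p, q):
    """D_KL(p || q) = sum over codons of p * log(p / q); smoothing keeps q > 0."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in CODONS)


def gene_scores(genes, pseudocount=1.0):
    """Divergence of each gene's codon usage from the pooled usage of all genes."""
    per_gene = [codon_counts(g) for g in genes]
    genome = frequencies(sum(per_gene, Counter()), pseudocount)
    return [kl_divergence(frequencies(c, pseudocount), genome) for c in per_gene]


# Toy usage: five genes with one codon bias and one gene with a different bias.
genes = ["GCTGCTGCTGCT"] * 5 + ["AAGAAGAAGAAG"]
scores = gene_scores(genes)
```

    Genes with the highest scores are the candidates for foreign origin; where to place the cutoff is exactly the kind of choice a benchmark on artificial genomes is meant to inform.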

    Identification of Prophages in Bacterial Genomes by Dinucleotide Relative Abundance Difference

    BACKGROUND: Prophages are integrated viral forms in bacterial genomes that have been found to contribute to interstrain genetic variability. Many virulence-associated genes are reported to be prophage encoded. Present computational methods detect prophages either by identifying possible essential proteins such as integrases or, as an extension of this technique, by identifying a region containing proteins similar to those occurring in prophages. These methods suffer from low sequence similarity at the protein level, which suggests that a nucleotide-based approach could be useful. METHODOLOGY: Dinucleotide relative abundance (DRA) has previously been used to identify regions of a genome that deviate from their neighboring areas. We have used the difference in dinucleotide relative abundance (DRAD) between bacterial and prophage DNA to help locate DNA stretches in bacterial genomes that could be of prophage origin. Prophage sequences that deviate from bacterial regions in their dinucleotide frequencies are detected by scanning bacterial genome sequences. The method was validated using a subset of genomes with prophage data from literature reports. A web interface for prophage scanning based on this method is available at http://bicmku.in:8082/prophagedb/dra.html. Two hundred bacterial genomes that do not have annotated prophages have been scanned for prophage regions using this method. CONCLUSIONS: The relative dinucleotide distribution difference helps detect prophage regions in genome sequences. The usefulness of this method is seen in the identification of 461 highly probable prophage loci that had not previously been annotated as such. This work emphasizes the need to extend efforts to detect and annotate prophage elements in genome sequences.
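    The scanning idea in this abstract can be sketched as below. The abstract does not give the exact DRAD formula, so this example assumes the commonly used definition of dinucleotide relative abundance, rho_xy = f_xy / (f_x * f_y), computed on a single strand, and takes the mean absolute difference between a window and the whole genome as the score. The function names, window size, and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def relative_abundance(seq):
    """rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides (single strand)."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono[b] for b in "ACGT")
    n_di = sum(di[d] for d in DINUCS)
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable A/C/G/T dinucleotides")
    rho = {}
    for d in DINUCS:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        # If a base is absent the dinucleotide cannot occur; report 0 by convention.
        rho[d] = (di[d] / n_di) / expected if expected > 0 else 0.0
    return rho


def abundance_difference(rho_a, rho_b):
    """Mean absolute difference between two relative-abundance profiles."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCS) / len(DINUCS)


def scan(genome, window, step):
    """Score each window by how far its profile lies from the whole-genome profile."""
    genome_rho = relative_abundance(genome)
    return [
        (start, abundance_difference(relative_abundance(genome[start:start + window]), genome_rho))
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy usage: a compositionally distinct insert placed inside a uniform background.
genome = "ACGT" * 50 + "AATT" * 10 + "ACGT" * 50
windows = scan(genome, window=40, step=40)
most_atypical = max(windows, key=lambda item: item[1])
```

    Windows with unusually high scores are candidates for closer inspection; real genomes would need much larger windows and a calibrated threshold.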

    TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

    Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10(1):56. Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomics data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results: Our novel strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy up to the rank of class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments into their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to the rank of class, and low false negative rates were also obtained.
Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive when compared with the most current classifier, PhyloPythia, and has the advantage that it can be installed locally and the reference set can be kept up to date.

    A Transcriptional “Scream” Early Response of E. coli Prey to Predatory Invasion by Bdellovibrio

    We have transcriptionally profiled the genes differentially expressed in E. coli prey cells when predatorily attacked by Bdellovibrio bacteriovorus just prior to prey cell killing. This is a brief, approximately 20–25 min period when the prey cell is still alive but contains a Bdellovibrio cell in its periplasm or attached to and penetrating its outer membrane. Total RNA was harvested and labelled 15 min after initiating a semi-synchronous infection with an excess of Bdellovibrio preying upon E. coli and hybridised to a macroarray spotted with all predicted ORFs of E. coli. SAM analysis and t-tests were performed on the resulting data and 126 E. coli genes were found to be significantly differentially regulated by the prey upon attack by Bdellovibrio. The results were confirmed by QRT-PCR. Amongst the prey genes upregulated were a variety of general stress response genes, potentially “selfish” genes within or near prophages and transposable elements, and genes responding to damage in the periplasm and osmotic stress. Essentially, the presence of the invading Bdellovibrio and the resulting damage to the prey cell elicited a small “transcriptional scream”, but seemingly no specific defensive mechanism with which to counter the Bdellovibrio attack. This supports other studies which do not find Bdellovibrio resistance responses in prey, and bodes well for its use as a “living antibiotic”

    Elusive Origins of the Extra Genes in Aspergillus oryzae

    The genome sequence of Aspergillus oryzae revealed unexpectedly that this species has approximately 20% more genes than its congeneric species A. nidulans and A. fumigatus. Where did these extra genes come from? Here, we evaluate several possible causes of the elevated gene number. Many gene families are expanded in A. oryzae relative to A. nidulans and A. fumigatus, but we find no evidence of ancient whole-genome duplication or other segmental duplications, either in A. oryzae or in the common ancestor of the genus Aspergillus. We show that the presence of divergent pairs of paralogs is a feature peculiar to A. oryzae and is not shared with A. nidulans or A. fumigatus. In phylogenetic trees that include paralog pairs from A. oryzae, we frequently find that one of the genes in a pair from A. oryzae has the expected orthologous relationship with A. nidulans, A. fumigatus and other species in the subphylum Eurotiomycetes, whereas the other A. oryzae gene falls outside this clade but still within the Ascomycota. We identified 456 such gene pairs in A. oryzae. Further phylogenetic analysis did not however indicate a single consistent evolutionary origin for the divergent members of these pairs. Approximately one-third of them showed phylogenies that are suggestive of horizontal gene transfer (HGT) from Sordariomycete species, and these genes are closer together in the A. oryzae genome than expected by chance, but no unique Sordariomycete donor species was identifiable. The postulated HGTs from Sordariomycetes still leave the majority of extra A. oryzae genes unaccounted for. One possible explanation for our observations is that A. oryzae might have been the recipient of many separate HGT events from diverse donors

    Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets

    Background: Mycobacterium tuberculosis continues to be a major pathogen in the third world, killing almost 2 million people a year by the most recent estimates. Even in industrialized countries, the emergence of multi-drug resistant (MDR) strains of tuberculosis hails the need to develop additional medications for treatment. Many of the drugs used for treatment of tuberculosis target metabolic enzymes. Genome-scale models can be used for analysis, discovery, and as hypothesis generating tools, which will hopefully assist the rational drug development process. These models need to be able to assimilate data from large datasets and analyze them. Results: We completed a bottom up reconstruction of the metabolic network of Mycobacterium tuberculosis H37Rv. This functional in silico bacterium, iNJ661, contains 661 genes and 939 reactions and can produce many of the complex compounds characteristic to tuberculosis, such as mycolic acids and mycocerosates. We grew this bacterium in silico on various media, analyzed the model in the context of multiple high-throughput data sets, and finally we analyzed the network in an 'unbiased' manner by calculating the Hard Coupled Reaction (HCR) sets, groups of reactions that are forced to operate in unison due to mass conservation and connectivity constraints. Conclusion: Although we observed growth rates comparable to experimental observations (doubling times ranging from about 12 to 24 hours) in different media, comparisons of gene essentiality with experimental data were less encouraging (generally about 55%). The reasons for the often conflicting results were multi-fold, including gene expression variability under different conditions and lack of complete biological knowledge. Some of the inconsistencies between in vitro and in silico or in vivo and in silico results highlight specific loci that are worth further experimental investigations. Finally, by considering the HCR sets in the context of known drug targets for tuberculosis treatment we proposed new alternative, but equivalent drug targets.

    Enzymatic degradation of granular potato starch by Microbacterium aurum strain B8.A

    Microbacterium aurum strain B8.A was isolated from the sludge of a potato starch-processing factory on the basis of its ability to use granular starch as a carbon and energy source. Extracellular enzymes hydrolyzing granular starch were detected in the growth medium of M. aurum B8.A, while the type strain M. aurum DSMZ 8600 produced very little amylase activity, and hence was unable to degrade granular starch. The strain B8.A extracellular enzyme fraction degraded wheat, tapioca and potato starch at 37 °C, well below the gelatinization temperature of these starches. Starch granules of potato were hydrolyzed more slowly than those of wheat and tapioca, probably due to structural differences and/or surface area effects. Partial hydrolysis of starch granules by extracellular enzymes of strain B8.A resulted in large holes of irregular sizes in the case of wheat and tapioca, and many smaller pores of relatively homogeneous size in the case of potato. The strain B8.A extracellular amylolytic system produced mainly maltotriose and maltose from both granular and soluble starch substrates; also, larger maltooligosaccharides were formed after growth of strain B8.A in rich medium. Zymogram analysis confirmed that a different set of amylolytic enzymes was present depending on the growth conditions of M. aurum B8.A. Some of these enzymes could be partly purified by binding to starch granules.

    Bacterial endosymbiont Cardinium cSfur genome sequence provides insights for understanding the symbiotic relationship in Sogatella furcifera host

    Background: Sogatella furcifera is a migratory pest that damages rice plants and causes severe economic losses. Due to its ability to annually migrate long distances, S. furcifera has emerged as a major pest of rice in several Asian countries. Symbiotic relationships of inherited bacteria with terrestrial arthropods have significant implications. The genus Cardinium is present in many types of arthropods, where it influences some host characteristics. We present a report of a newly identified strain of the bacterial endosymbiont Cardinium cSfur in S. furcifera. Result: From the whole genome of S. furcifera previously sequenced by our laboratory, we assembled the whole genome sequence of Cardinium cSfur. The sequence comprised 1,103,593 bp with a GC content of 39.2%. The phylogenetic tree of the Bacteroides phylum to which Cardinium cSfur belongs suggests that Cardinium cSfur is closely related to the other strains (Cardinium cBtQ1 and cEper1) that are members of the Amoebophilaceae family. Genome comparison between host-dependent endosymbionts, including Cardinium cSfur, and free-living bacteria revealed that the endosymbiont has a smaller genome size and lower GC content, and has lost some genes related to metabolism because of its special environment, which is similar to the genome pattern observed in other insect symbionts. Cardinium cSfur has limited metabolic capability, which makes it less contributive to metabolic and biosynthetic processes in its host. From our findings, we inferred that, to compensate for its limited metabolic capability, Cardinium cSfur harbors a relatively high proportion of transport proteins, which might act as the hub between it and its host. With its acquisition, through HGT events, of the whole operon related to biotin synthesis and of glycolysis-related genes, Cardinium cSfur seems to be undergoing changes while establishing a symbiotic relationship with its host.
Conclusion: A novel bacterial endosymbiont strain (Cardinium cSfur) has been discovered. A genomic analysis of the endosymbiont in S. furcifera suggests that its genome has undergone certain changes to facilitate its settlement in the host. The envisaged potential reproduction manipulative ability of the new endosymbiont strain in its S. furcifera host has vital implications in designing eco-friendly approaches to combat the insect pest